NSF PAR Search | NSF Public Access Repository

SDA: Low-Bit Stable Diffusion Acceleration on Edge FPGAs

https://doi.org/10.1109/FPL64840.2024.00044

Yang, Geng; Xie, Yanyue; Xue, Zhong Jia; Chang, Sung-En; Li, Yanyu; Dong, Peiyan; Lei, Jie; Xie, Weiying; Wang, Yanzhi; Lin, Xue; et al. (September 2024, IEEE)

Full Text Available

VindLU: A Recipe for Effective Video-and-Language Pretraining

https://doi.org/10.1109/CVPR52729.2023.01034

Cheng, Feng; Wang, Xizi; Lei, Jie; Crandall, David; Bansal, Mohit; Bertasius, Gedas (June 2023, IEEE)

Full Text Available

Vision Transformers are Parameter-Efficient Audio-Visual Learners

https://doi.org/10.1109/CVPR52729.2023.00228

Lin, Yan-Bo; Sung, Yi-Lin; Lei, Jie; Bansal, Mohit; Bertasius, Gedas (June 2023, IEEE)

Full Text Available

TVQA: Localized, Compositional Video Question Answering

Lei, Jie; Yu, Licheng; Bansal, Mohit; Berg, Tamara L. (January 2018, Empirical Methods in Natural Language Processing)

Recent years have witnessed increasing interest in image-based question-answering (QA) tasks. However, due to data limitations, there has been much less work on video-based QA. In this paper, we present TVQA, a large-scale video QA dataset based on 6 popular TV shows. TVQA consists of 152,545 QA pairs from 21,793 clips, spanning over 460 hours of video. Questions are designed to be compositional in nature, requiring systems to jointly localize relevant moments within a clip, comprehend subtitle-based dialogue, and recognize relevant visual concepts. We provide analyses of this new dataset as well as several baselines and a multi-stream end-to-end trainable neural network framework for the TVQA task. The dataset is publicly available at http://tvqa.cs.unc.edu.
Full Text Available
